One alternative for improving feed quality is to combine the main feed with
forages that are economical but high in protein, such as sorghum. Production
estimation is essential because it determines the sustainability of the feed
supply. This study aimed to estimate sorghum production using support vector
regression (SVR). The research stages are data collection, preprocessing,
modelling, and evaluation. The dataset used as input for the SVR model is
field observation data. The kernels used in the SVR modelling are linear,
polynomial, and RBF. The results show that the SVR model can estimate sorghum
production, with performance evaluated using R2, mean absolute error (MAE),
mean absolute percentage error (MAPE), and root mean square error (RMSE).
The best results, obtained with the polynomial kernel, are R2=0.7841,
MAE=0.0681, MAPE=0.46641, and RMSE=0.1006. This study shows that the
regression model obtained from the SVR algorithm with the polynomial kernel
is the best model for estimating sorghum production.
1. INTRODUCTION

In the livestock industry, a significant concern is the availability of land, livestock, and feed. The main
feed for the livestock industry is grass, but grass is high in fibre and low in protein. To add nutritional
value, grass needs to be mixed with concentrates such as dregs, corn, and similar feedstuffs, which increases
production costs. According to Abdullah and Suharlina [1], an alternative source of high-protein forage at an
economical cost is to combine the main feed with forage crops such as sorghum.
Sorghum has been introduced and cultivated in Indonesia, particularly in dry and marginal areas [2], and it is
a universal multipurpose crop for food, fodder, and potential biofuel feedstock [3].

The problem addressed in this study is determining whether the feed biomass produced for ruminants is
sufficient, excessive, or insufficient as material for combining forage with the main feed. Therefore, it is
necessary to estimate the biomass of sorghum yields. In the industrial era 4.0, information technology has
become necessary in various fields, including agriculture and animal husbandry. One use of this technology is
the application of machine learning models, that is, computer programs that optimize performance using
historical data [4]. According to Liakos et al. [5], machine learning methodology generally involves a learning
process that learns from training data in order to carry out a task. Ghosal et al. [6] state in
the national strategy for artificial intelligence in Indonesia that increasing forecasting accuracy with machine
learning will help farmers plan agricultural cycle activities. Several previous studies by BPPT [7] and
Masjedi et al. [8] used machine learning algorithms for research on sorghum plants. Several studies related to 
predictions in agriculture and animal husbandry are wheat yield prediction [9], crop disease prediction [10] 
and [11]. One widely used algorithm for forecasting or predicting target values is the support vector machine
(SVM), applied for example in the medical field [12], analogue circuits [13], education [14], face recognition
[15], and agriculture [16]. For regression problems, SVM is adapted into the support vector regression
(SVR) algorithm [17]. SVR aims to find a hyperplane that fits the training data set, with the optimal parameter
values obtained through the grid search method. Grid search evaluates a model over combinations of parameter
values to find the combination with the lowest error [18]. The parameters to be optimized are epsilon (ε),
cost (C), and gamma (γ). Several kernel functions can be used in SVR, such as linear, Gaussian,
polynomial, and others. SVR builds a hyperplane in a high-dimensional space for linear or nonlinear
data and can overcome overfitting [19]. Several studies have used SVR in agriculture and animal husbandry, such
as soil erosion susceptibility prediction [20], predicting forage quality of warm-season legumes [21], a crop
model of rice production [22], water stress detection [23], and Rumex and Urtica detection in grasslands
[24]. In this study, it is hoped that SVR models will become widely used to predict
or estimate sorghum biomass production. This study looks for the best of the three kernel functions
and the best parameters, found using the GridSearch method, for an SVR model that estimates the biomass
of animal feed sorghum production.


2. METHOD 

This research methodology has five stages: a preliminary study, data collection, preprocessing,
modelling, and model evaluation. The stages are shown in the research flow chart in Figure 1 and are
explained as follows:

- Data collection: This study uses sample data taken from direct observations in the Sorghum bicolor
cv. Samurai-2 block of a sorghum field. The research area is the Jonggol Animal Science Teaching and
Research Unit (JASTRU), Singasari Village, Jonggol District, Bogor City. The dataset was taken at harvest
time on March 8, 2021, and contains data on 88 plants, with the attributes shown in Table 2 and sample
data from the field shown in Table 1.
Figure 1. Research steps
Table 1. Sample data of Sorghum bicolor cv. Samurai-2

No sample  Latitude     Longitude     Stem height  Stem diameter  Leaves  Seed height  Seed width  Biomass
6          -6.46875233  107.01083981  170          14             9       19           4           204
12         -6.46885800  107.01121700  224          25             15      24           5           677
18         -6.46867571  107.01117843  125          11             10      22           4           147
27         -6.46846183  107.01120257  219          16             10      27           7           291
38         -6.46855311  107.01108322  219          23             11      32           9           574
39         -6.46838988  107.01111775  217          20             12      25           5           381
42         -6.46840753  107.01109931  177          19             11      19           4           341
43         -6.46847116  107.01105975  219          20             10      28           6           480
44         -6.46856444  107.01105069  150          15             10      18           3           156
47         -6.46861375  107.01098096  232          19             11      29           6           306
48         -6.46849082  107.01104090  234          22             12      30           8           554
50         -6.46826595  107.01107886  164          15             9       26           5           231
52         -6.46839654  107.01091088  215          16             13      26           5           271


Table 2. Attributes of the Sorghum bicolor cv. Samurai-2 dataset

No  Attribute      Description
1   NO_SAMPLE      Sample number; A and B indicate two plant stems growing at one plant location
2   LATITUDE       Latitude
3   LONGITUDE      Longitude
4   STEM_HEIGHT    Plant stem height in centimeters
5   STEM_DIAMETER  Plant stem diameter in millimeters
6   LEAVE          Number of leaves
7   SEED_HEIGHT    Sorghum seed length in centimeters
8   SEED_WIDTH     Sorghum seed width in centimeters
9   BIOMASS        Overall weight in grams (target attribute)


- Preprocess: The attributes of the dataset obtained from field recording are shown in Table 2. The target
attribute is BIOMASS, which is the plant's total weight in grams. All selected features are normalized
to the range 0 to 1 before being input into the SVR model.

- Modeling: At this stage, the preprocessed dataset is input into SVR models with three different kernel
functions: linear, Gaussian, and polynomial. Several parameter values are tested to find the best result
for each kernel function during modelling and evaluation. Model validation uses k-fold cross-validation.

- Evaluation: At this stage, the output of the modelling process is validated by comparing the actual and
predicted values for each SVR kernel function. The model with the lowest error is recommended for
predicting sorghum production. All kernel functions are validated using R-squared, mean absolute error
(MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). The success indicator
for this final stage is a model with minimum error.
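
The preprocessing and validation steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the normalization is assumed to be linear (min-max) rescaling, and the folds are taken contiguously without shuffling. scikit-learn offers `MinMaxScaler` and `KFold` for the same purposes, but the paper does not say which classes were used.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Stem heights of the first five sample rows in Table 1 (illustration only).
stem_height = [170, 224, 125, 219, 219]
scaled = min_max_normalize(stem_height)

# 88 plants split into 10 folds (ten folds is the value reported in Section 3).
folds = k_fold_indices(88, 10)
fold_sizes = [len(f) for f in folds]

print(scaled)
print(fold_sizes)
```

Each fold serves once as the validation set while the remaining folds are used for training, so every plant is predicted exactly once.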


3. RESULTS AND DISCUSSION 

Correlation analysis was performed on the attributes of the sorghum sample data collected at harvest
time, namely STEM_HEIGHT, STEM_DIAMETER, LEAVE, SEED_HEIGHT, SEED_WIDTH, and BIOMASS. Some attribute
pairs are very highly correlated and others only weakly; the correlations are visualized in Figure 2. The
highest correlation, 0.92, was between seed or panicle width (SEED_WIDTH) and seed or panicle length
(SEED_HEIGHT). The correlation between the number of leaves (LEAVE) and seed or panicle length
(SEED_HEIGHT) was weak, at 0.35. The preprocessing, modelling, and evaluation stages were carried out
in Python 3.7 using the scikit-learn library, mainly its classes for support vector regression and
cross-validation.
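
As an illustration of how a pairwise correlation such as the ones above can be computed, here is a minimal sketch that assumes the Pearson coefficient (the paper does not name the coefficient used). The function name is hypothetical, and the input is only the first five sample rows of Table 1, so the result will not equal the 0.92 reported for the full 88-plant dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Seed height and seed width of samples 6, 12, 18, 27, and 38 in Table 1.
seed_height = [19, 24, 22, 27, 32]
seed_width = [4, 5, 4, 7, 9]

r = pearson(seed_height, seed_width)
print(round(r, 4))
```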

In the preprocessing stage, normalization was carried out to rescale the values of each attribute
proportionally to the range 0 to 1. The preprocessed dataset is the input to the SVR algorithm at the
modelling stage. The output of the modelling process is validated, and the results are assessed at the
evaluation stage. Cross-validation uses ten folds, the standard value for estimating accuracy with this
method [25]. The SVR model constructs a hyperplane in a high-dimensional space for nonlinear data, as
shown in (1).
$f(x_i) = \langle w, x_i \rangle + b$    (1)
Where:
f(x_i) = predictive value
x_i = data
w = weight
b = bias value

The coefficients w and b are estimated by minimizing the risk function in (2), where ||w||^2 is the
regularization term that keeps the function as flat as possible and E_ε is the ε-insensitive loss function.
The user-defined coefficient C sets the trade-off between the flatness of the function f and the amount by
which deviations above the tolerated upper limit ε are penalized [26].


$R(f) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} E_\varepsilon(y_i - f(x_i))$    (2)

$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b$    (3)
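
The role of the ε-insensitive loss in the risk function (2) can be made concrete with a short sketch. It assumes the standard definition of that loss, max(0, |y - f(x)| - ε), which the text names but does not write out; the function names and the toy weight vector are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def regularized_risk(weights, y_true, y_pred, C, epsilon):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses, following (2)."""
    flatness = 0.5 * sum(w * w for w in weights)
    loss = sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred)
    )
    return flatness + C * loss


# First row of Table 3 with the linear kernel's epsilon = 0.0005.
loss_outside_tube = epsilon_insensitive_loss(0.6251, 0.5577, 0.0005)

# A prediction that misses by less than epsilon costs nothing.
loss_inside_tube = epsilon_insensitive_loss(0.5, 0.5003, 0.0005)

# Toy risk value with a hypothetical two-element weight vector.
risk = regularized_risk([1.0, 2.0], [1.0], [0.0], C=2, epsilon=0.5)

print(loss_outside_tube, loss_inside_tube, risk)
```

A larger C makes deviations outside the tube more expensive relative to the flatness term, which is why C and epsilon are tuned together.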

According to Smola and Schölkopf [27], determining the parameters w and b in (2) becomes an optimization
problem that is solved using the Lagrangian. The final equation for SVR prediction is shown in (3), where
α_i and α_i^* are the Lagrange multipliers and K is the selected kernel function. Several kernel functions
are commonly used in SVR models, including ones that handle nonlinear data: the linear kernel in (4), the
polynomial kernel in (5), and the Gaussian radial basis function (RBF) kernel in (6).

$K(x, x') = \langle x, x' \rangle$    (4)

$K(x, x') = (\gamma \langle x, x' \rangle + r)^d$    (5)
Where d is the degree parameter and r is the coefficient.

$K(x, x') = \exp(-\gamma \|x - x'\|^2)$    (6)
Where γ is the gamma parameter, which must be greater than 0.
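
The three kernels in (4) to (6) can be written out directly. The sketch below uses hypothetical function names and toy input vectors. The degree d=2 and the RBF gamma=0.1 match the best values reported later in this section, while gamma=1 and r=1 for the polynomial kernel are assumptions, since the paper does not report them.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    """Equation (4): plain inner product."""
    return dot(x, z)


def polynomial_kernel(x, z, gamma, r, d):
    """Equation (5): scaled and shifted inner product raised to degree d."""
    return (gamma * dot(x, z) + r) ** d


def rbf_kernel(x, z, gamma):
    """Equation (6): Gaussian of the squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Toy feature vectors, not taken from the dataset.
x = [1.0, 2.0]
z = [3.0, 4.0]

k_linear = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z, gamma=1.0, r=1.0, d=2)
k_rbf = rbf_kernel(x, z, gamma=0.1)

print(k_linear, k_poly, k_rbf)
```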


[Figure 2: correlation heatmap of STEM_HEIGHT, STEM_DIAMETER, LEAVES, SEED_HEIGHT, SEED_WIDTH, and BIOMASS; color scale from 0.4 to 1.0]

Figure 2. Dataset correlation 


A complete grid search takes a very long time. Therefore, Hsu and Lin [18] recommend first running a
loose (coarse) grid to select values for C and the other parameters, then running a finer grid around the
C value that gave the lowest error. The parameters C and epsilon of the SVR model for each kernel were
searched with GridSearch over the following combinations of values: C = 0.01, 0.1, 1, 100, 1000 and
epsilon = 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10.
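
The exhaustive search over this grid can be sketched as follows. The parameter lists are the ones given above; `grid_search` and `toy_score` are hypothetical names. `toy_score` is an arbitrary stand-in chosen only so the example runs. In the actual workflow the score would be the cross-validated error of an SVR model fitted with each (C, epsilon) pair.

```python
from itertools import product

C_VALUES = [0.01, 0.1, 1, 100, 1000]
EPSILON_VALUES = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]


def grid_search(score_fn, c_values, epsilon_values):
    """Evaluate every (C, epsilon) pair and return the pair with the lowest score."""
    best_params, best_score = None, float("inf")
    for C, epsilon in product(c_values, epsilon_values):
        score = score_fn(C, epsilon)
        if score < best_score:
            best_params, best_score = (C, epsilon), score
    return best_params, best_score


def toy_score(C, epsilon):
    # Hypothetical stand-in for a cross-validated error; NOT the paper's metric.
    return abs(C - 1) + abs(epsilon - 0.001)


grid = list(product(C_VALUES, EPSILON_VALUES))
best_params, best_score = grid_search(toy_score, C_VALUES, EPSILON_VALUES)

print(len(grid), best_params, best_score)
```

With 5 values of C and 11 values of epsilon, the grid holds 55 combinations per kernel. Under ten-fold cross-validation each combination requires ten model fits, which is why a coarse grid followed by a finer one is recommended.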
With the linear kernel, the best parameters found by the GridSearch method were C=1 and epsilon=0.0005.
Examples of the resulting predicted values compared with the real or actual values are shown in Table 3
and visualized in Figure 3. With the polynomial kernel, the best parameters found were C=1, degree=2,
and epsilon=0.001. Examples of the predicted values compared with the real or actual values are shown in
Table 4 and visualized in Figure 4.


Table 3. Real data vs. linear predicted biomass

Real/Actual  Linear predicted
0.6251       0.5577
0.2691       0.3145
0.2912       0.4388
0.4714       0.4446


[Figure 3 plot: Real vs Predicted SVR Linear Kernel; series: Real, Linear Predicted]


Figure 3. Graph of real data vs. linear predicted biomass 


Table 4. Real data vs. polynomial predicted biomass

Real/Actual  Polynomial predicted
0.6251       0.5725
0.2691       0.2869
0.2912       0.4175
0.4714       0.4734


With the RBF kernel, the best parameters found by the GridSearch method were C=100, epsilon=0.01, and
gamma=0.1. Examples of the predicted values compared with the real or actual values are shown in Table 5
and visualized in Figure 5.


Table 5. Real data vs. Gaussian RBF predicted biomass

Real/Actual  RBF predicted
0.6251       0.5813
0.2691       0.2846
0.2912       0.3591
0.4714       0.4767
Figure 4. Graph of real data vs. polynomial predicted biomass


[Figure 5 plot: Real vs Predicted SVR RBF Kernel]
Figure 5. Graph of real data vs. Gaussian RBF predicted biomass


At the evaluation stage, the accuracy of each kernel's model, that is, the agreement between the predicted
values and the real or actual values, is measured using R-squared, as shown in (7) [28].


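$R^2 = 1 - \frac{SSE_R}{SSE_M}$    (7)
Where SSE_R is the sum of squared errors of the regression model and SSE_M is the sum of squared errors of
the mean (baseline) model.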
The error between the predicted value and the real or actual value for each kernel's model is measured
using MAE [29] in (8), MAPE in (9), and RMSE in (10) [28].


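$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i^* - y_i|$    (8)

$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i^* - y_i}{y_i} \right| \times 100$    (9)

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i^* - y_i)^2}$    (10)
Where y_i^* is the predicted value, y_i is the real or actual value, and n is the number of samples.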
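
The four metrics in (7) to (10) can be sketched in plain Python. R-squared is assumed here to take its standard form, one minus the ratio of the model's squared error to the squared error of a mean-only baseline. The function names are hypothetical. The inputs are only the four example rows of Table 4, so the results differ from the full-dataset values reported for the polynomial kernel.

```python
import math


def r_squared(actual, predicted):
    """1 - (squared error of the model) / (squared error of the mean baseline)."""
    mean_actual = sum(actual) / len(actual)
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_mean = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse_model / sse_mean


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# The four example rows of Table 4 (normalized biomass), illustration only.
actual = [0.6251, 0.2691, 0.2912, 0.4714]
predicted = [0.5725, 0.2869, 0.4175, 0.4734]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
r2_value = r_squared(actual, predicted)

print(mae_value, rmse_value, mape_value, r2_value)
```

Because MAPE divides by the actual value, samples whose normalized biomass is close to zero can dominate it, which may help explain why MAPE is large relative to MAE and RMSE.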
The predicted values and the real or actual values from each kernel's model are entered into the
MAE, MAPE, and RMSE equations to produce the error measurements shown in Table 6.


Table 6. Metric comparison result

Metric     Linear kernel  Polynomial kernel  Gaussian RBF kernel
R-squared  0.6861         0.7841             0.7751
MAE        0.0836         0.0681             0.0715
RMSE       0.1173         0.1006             0.1055
MAPE (%)   55.1891        46.641             48.7648


SVR with the polynomial kernel has the smallest MAE, MAPE, and RMSE compared to the linear kernel and the
Gaussian RBF kernel. It also has the largest R-squared value of the three kernels. Based on these results,
the polynomial kernel is the most suitable of the three kernels for the SVR method for estimating the
production of forage sorghum (Sorghum bicolor) cv. Samurai-2.
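
The selection rule applied to Table 6 (lowest value for the error metrics, highest for R-squared) can be expressed as a short sketch. The dictionary holds the values from Table 6; the variable and function names are hypothetical.

```python
# Metric values copied from Table 6.
TABLE_6 = {
    "linear": {"r2": 0.6861, "mae": 0.0836, "rmse": 0.1173, "mape": 55.1891},
    "polynomial": {"r2": 0.7841, "mae": 0.0681, "rmse": 0.1006, "mape": 46.641},
    "rbf": {"r2": 0.7751, "mae": 0.0715, "rmse": 0.1055, "mape": 48.7648},
}


def best_kernel(results, metric, higher_is_better=False):
    """Return the kernel name with the best value for one metric."""
    choose = max if higher_is_better else min
    return choose(results, key=lambda kernel: results[kernel][metric])


winners = {
    "r2": best_kernel(TABLE_6, "r2", higher_is_better=True),
    "mae": best_kernel(TABLE_6, "mae"),
    "rmse": best_kernel(TABLE_6, "rmse"),
    "mape": best_kernel(TABLE_6, "mape"),
}

print(winners)
```

All four metrics agree here, so no weighting between metrics is needed. If they disagreed, the choice would depend on which error measure matters most for the application.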


4. CONCLUSION 

In this study, to obtain an SVR model for estimating the biomass of animal feed sorghum, we tried
SVR models with linear, polynomial, and Gaussian RBF kernels. For each kernel, the best parameters were
searched using the GridSearch method. The SVR model using the polynomial kernel with parameters C=1,
degree=2, and epsilon=0.001 has the lowest error value and the highest coefficient of determination.
Thus, SVR with a polynomial kernel can be recommended for estimating the biomass of Sorghum bicolor
cv. Samurai-2. Future research can develop these results further by using other kernels such as
splines and B-splines.